File: USPT



Aug 3, 1999 



DOCUMENT-IDENTIFIER: US 5933822 A 

TITLE: Apparatus and methods for an information retrieval system that employs 
natural language processing of search results to improve overall precision 



Brief Summary Text (6) : 

However, with the advent and proliferation of the so-called "world-wide web" 
(hereinafter simply referred to as the "web") accessible through the Internet, and 
the relative ease and low cost of posting information to the web and accessing 
information therefrom as contrasted with traditional publishing, the amount of 
information available on the web is growing exponentially, if not explosively, 
with apparently no realistic limit in sight. While the web offers an increasingly 
rich array of information across all disciplines of human endeavor, its content is 
highly chaotic and extremely disorganized, which severely complicates, and often 
frustrates, information access and retrieval therefrom.

Brief Summary Text (9) : 

Consequently, to reduce the number of irrelevant documents that are retrieved, 
conventional keyword-based search engines (hereinafter simply referred to as 
"statistical search engines") incorporate statistical processing into their search 
methodologies. For example, based on the total number of keywords matching between 
the query and the content words in each retrieved document record, and on how well 
these words match (i.e., in the combination and/or within the proximity range 
requested), a statistical search engine calculates numeric measures, frequently 
referred to collectively as "statistics", for each such retrieved document record. 
These statistics may include an inverse document frequency for each matching word. 
The engine then ranks the document records by their statistics and returns to the 
user a small predefined number of the highest-ranked records, typically 5-20 or 
fewer. Once the user has reviewed this first group of document records (or, for 
some engines, the documents themselves, if the engine returns them), the user can 
request the next group of document records having the next highest rankings, and 
so forth until all the retrieved document records have been reviewed.
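The ranking and paging just described can be sketched in Python. This is a minimal illustration, not any particular engine's method: it assumes each record is a plain list of content words and that the "statistic" is a simple sum, over distinct matching query terms, of the term's count in the record times its inverse document frequency log(N/df). The names `idf`, `score` and `ranked_pages` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterator, List


def idf(term: str, docs: Dict[str, List[str]]) -> float:
    """Inverse document frequency log(N / df); 0.0 if the term occurs nowhere."""
    df = sum(1 for words in docs.values() if term in words)
    return math.log(len(docs) / df) if df else 0.0


def score(query: List[str], words: List[str], docs: Dict[str, List[str]]) -> float:
    """Hypothetical statistic: for each distinct query term present in the
    record, its occurrence count times its IDF, summed."""
    counts = Counter(words)
    return sum(counts[t] * idf(t, docs) for t in set(query) if t in counts)


def ranked_pages(query: List[str], docs: Dict[str, List[str]],
                 page_size: int = 10) -> Iterator[List[str]]:
    """Yield record IDs in groups of page_size, highest statistic first."""
    terms = set(query)
    # Only records sharing at least one keyword with the query are retrieved.
    hits = [d for d, words in docs.items() if terms & set(words)]
    hits.sort(key=lambda d: score(query, docs[d], docs), reverse=True)
    # Each slice is one group the user reviews before asking for the next.
    for i in range(0, len(hits), page_size):
        yield hits[i:i + page_size]
```

Returning a generator mirrors the interaction described above: the caller pulls the next group only when the user asks for it.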

Brief Summary Text (16) : 

A further syntactic-based approach of this sort is described in B. Katz, 
"Annotating the World Wide Web using Natural Language", Conference Proceedings of 
RIAO 97, Computer-Assisted Information Searching in Internet, McGill University, 
Quebec, Canada, Jun. 25-27, 1997, Vol. 1, pages 136-155 [hereinafter the "Katz 
publication"]. As described in the Katz publication, subject-verb-object 
expressions are created while preserving their internal structure, so that minor 
syntactic alternations can be accommodated during retrieval.

Brief Summary Text (23) : 

In accordance with our specific teachings, such a search ultimately yields a set of 
retrieved documents from, e.g., a database or the world wide web. Each document is 
then subjected to natural language processing, specifically morphological, 
syntactic and logical form analysis, to ultimately produce appropriate logical 
forms for each sentence in each document. A user-supplied query is analyzed in the 
same manner to yield a set of corresponding logical form triples therefor. The set 
of logical forms for the query is then compared to the sets of logical forms 
associated with each of the retrieved documents in order to ascertain a match 
between logical forms from the query set and logical forms from each document set. 
Documents that produce no matches are eliminated from further consideration. Each 
remaining document is then heuristically scored. In particular, each different 
relation type that can occur in a logical form, e.g., deep subject, deep object, 
operator and the like, is assigned a predefined weight. The score of each such 
remaining document is a predefined function of the weights of the matching logical 
forms therein. This function may be, e.g., a sum of the weights associated with all 
unique matching triples (duplicate matches being ignored) which occur in that 
document. Finally, the retained documents are presented to a user in descending 
rank order based on their scores, typically in groups of a small predefined number 
of, e.g., five or ten, documents, starting with the group having the highest 
scores and then followed, in descending rank order, by other groups in succession, 
as the user so selects.
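The match-filter-score-rank procedure above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: logical forms are reduced to (word, relation, word) tuples, the relation labels `Dsub`/`Dobj`/`Ops` and their weights are hypothetical stand-ins for "a predefined weight" per relation type, and the scoring function is the sum-of-weights example the text gives.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (word, relation type, word)

# Hypothetical per-relation weights; the text says only that each relation
# type is assigned "a predefined weight". Unlisted relations default to 1.0.
WEIGHTS: Dict[str, float] = {"Dsub": 5.0, "Dobj": 4.0, "Ops": 2.0}


def rank_by_logical_forms(query: Iterable[Triple],
                          documents: Dict[str, Iterable[Triple]],
                          group_size: int = 5) -> List[List[Tuple[str, float]]]:
    """Score documents by the summed weights of their unique matching
    triples, drop non-matching documents, and group the rest by rank."""
    q = set(query)
    scored = []
    for name, triples in documents.items():
        matches = q & set(triples)   # set semantics: duplicate matches count once
        if not matches:              # no matching triple: eliminate the document
            continue
        scored.append((name, sum(WEIGHTS.get(rel, 1.0) for _, rel, _ in matches)))
    scored.sort(key=lambda item: item[1], reverse=True)  # descending score
    return [scored[i:i + group_size] for i in range(0, len(scored), group_size)]
```

For example, with the query triples ("eat", "Dsub", "dog") and ("eat", "Dobj", "bone"), a document containing both scores 9.0, while one repeating only the first triple twice still scores 5.0, because duplicates are ignored.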

Detailed Description Text (7) : 

Inasmuch as system 5 is highly general-purpose and can be adapted to a wide range 
of different applications, we will, to simplify the following discussion, describe 
use of our invention in one illustrative context: an information retrieval system 
that employs a conventional keyword-based statistical Internet search engine to 
retrieve stored records of English-language documents indexed into a dataset from 
the world wide web. Each such record generally contains predefined information, as 
set forth below, for a corresponding document. For other search engines, the record 
may contain the entire document itself. Though the following discussion addresses 
our invention in the context of a conventional Internet search engine that 
retrieves a record containing certain information about a corresponding document, 
including a web address at which that document can be found, generically speaking 
the ultimate item retrieved by that engine is, in fact, the document, even though 
an intermediate process, using that address, is generally employed to actually 
access the document from the web. After considering the following description, 
those skilled in the art will readily appreciate how our present invention can be 
easily adapted for use in any other information retrieval application.

Detailed Description Text (8) : 

FIG. 2 depicts a high-level block diagram of a particular embodiment of our 
invention used in the context of an Internet search engine; our invention will 
principally be discussed in detail in the context of this embodiment. As shown, 
system 200 contains computer system 300, such as a client personal computer (PC), 
connected, via network connection 205, through network 210 (here the Internet, 
though any other such network, e.g., an intranet, could alternatively be used) and 
network connection 215, to server 220. The server typically contains computer 222, 
which hosts Internet search engine 225, typified by, e.g., the ALTA VISTA search 
engine (ALTA VISTA is a registered trademark of Digital Equipment Corporation of 
Maynard, Mass.), and is connected to mass data store 227, typically a dataset of 
document records indexed by the search engine and accessible through the World Wide 
Web on the Internet. Each such record typically contains: (a) a web address 
(commonly referred to as a uniform resource locator, or URL) at which the 
corresponding document can be accessed by a web browser; (b) predefined content 
words which appear in that document, along with, in certain engines, the address of 
each such word relative to other content words in that document; (c) a short 
summary, often just a few lines, of the document, or the first few lines of the 
document; and possibly (d) a description of the document as provided in its 
hypertext markup language (HTML) description field.
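Items (a) through (d) can be pictured as a simple record type. The following Python sketch is illustrative only: the field names are hypothetical, the "relative address" of each content word is assumed to be an integer word offset, and `within` is a hypothetical helper showing how such offsets could support a proximity test between two query words.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DocumentRecord:
    url: str                                # (a) where a browser can fetch the document
    word_positions: Dict[str, List[int]]    # (b) content word -> word offsets (assumed form)
    summary: str = ""                       # (c) short summary or first few lines
    html_description: Optional[str] = None  # (d) HTML description field, if present

    def within(self, w1: str, w2: str, max_gap: int) -> bool:
        """True if some occurrence of w1 lies within max_gap words of some
        occurrence of w2 (hypothetical proximity check)."""
        return any(abs(p - q) <= max_gap
                   for p in self.word_positions.get(w1, [])
                   for q in self.word_positions.get(w2, []))
```

Keeping positions per word, rather than storing the full text, is what lets an engine answer proximity requests from the record alone; engines that store whole documents would not need this field.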

Other Reference Publication (1) :

B. Katz, "Annotating the World Wide Web using Natural Language", Conference 
Proceedings of RIAO 97, Computer-Assisted Information Searching in Internet, McGill 
University, Quebec, Canada, Jun. 25-27, 1997, vol. 1, pp. 136-155. 

Other Reference Publication (9) :

O. Etzioni, "The World-Wide Web: Quagmire or Gold Mine", Communications of the ACM, 
Nov. 1996, vol. 39, no. 11, pp. 65-68.
File: USPT



Aug 28, 2001 



DOCUMENT-IDENTIFIER: US 6282543 B1

TITLE: Database search and display method and database search system 

Brief Summary Text (6) :

There has been a technique for searching databases distributed on a network and 
displaying retrieved information. The WWW (World Wide Web) utilizing the Internet 
is an example of such a technique, with which a user can easily and inexpensively 
obtain desired information, and it is rapidly becoming popular worldwide.

Brief Summary Text (7) : 

In the WWW, search software for performing a keyword search such as "Yahoo!" is 
well known and widely used as directory services for notifying database to be 



Brief Summary Text (12) : 

FIG. 1 shows an example in which this technique is applied to the Internet. In FIG. 
1, a directory information provider 30 stores scenario information 40 composed of 
URLs and information assigning a time interval for transmission of each URL. The 
directory information provider 30 transmits the URLs, arranged on a time axis as 
described in the scenario information 40, to a database searcher 20 according to 
the assigned time intervals. In the database searcher 20, a browser controller 22 
receives each URL from the directory information provider 30 and outputs it to a 
WWW browser 21. The WWW browser 21 accesses the corresponding WWW server 61, 62, 
. . . , or 6n on the Internet on the basis of the input URL, downloads the home 
page at that URL, and displays it.
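The time-axis delivery of URLs described for FIG. 1 can be sketched as follows. This is an illustration under assumptions the text does not fix: each scenario entry is taken to be a (URL, interval in seconds) pair, with the interval read as the wait before the next URL is sent; `transmission_schedule` and `play` are hypothetical names, and `play` takes the display and wait functions as parameters so that a real browser hook (for instance Python's `webbrowser.open`) or a test stub can be supplied.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical scenario format: (URL, seconds until the next URL is sent).
Scenario = List[Tuple[str, float]]


def transmission_schedule(scenario: Scenario,
                          start: float = 0.0) -> List[Tuple[float, str]]:
    """Provider side: the time at which each URL is transmitted."""
    schedule, t = [], start
    for url, interval in scenario:
        schedule.append((t, url))
        t += interval
    return schedule


def play(scenario: Scenario,
         show: Callable[[str], None],
         sleep: Callable[[float], None] = time.sleep) -> None:
    """Searcher side: hand each URL to the browser, then wait its interval."""
    for url, interval in scenario:
        show(url)        # e.g. webbrowser.open(url)
        sleep(interval)  # the browser keeps displaying this page meanwhile
```

Calling `play(scenario, webbrowser.open)` would step a local browser through the sequence, which is the "television-like" viewing the passage describes.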

Brief Summary Text (14): 

As mentioned above, according to the database search method utilizing the WWW and 
the device therefor proposed by the inventors, the user can watch and hear the 
information of the home page sent from the WWW server by watching the screen of 
the WWW browser, as if watching a television screen.

Brief Summary Text (15): 

That is, according to the database search method utilizing the WWW and the device 
therefor proposed by the inventors, the user can watch the screen of the WWW browser 
as if watching a television screen, and the information provider can show the user 
the information that the provider wants to show, in the sequence the provider 
desires.

Brief Summary Text (36) : 

A fifth aspect of the present invention is an application of the present invention 
to a database search system on the Internet. The database search system comprises 
a World Wide Web database provider connected to the Internet; a directory 
information provider for retrieving position information of data on the Internet 
and providing directory information thereof; and a database searcher for acquiring 
data of the database provider and displaying the data of the database provider to a 
user on the basis of the directory information of the directory information 
provider. The directory information provider provides scenario information composed 
of main information presenting sequence control information, for arranging on a 
time axis the position information of the main information to be output/displayed 
to a user of the database searcher; secondary information presenting sequence 
control information, for inserting secondary information into the main information 
and outputting/displaying it; and presenting control information, for controlling 
the insertion timing of the secondary information into the main information. The 
database searcher comprises means for inserting the secondary information into the 
main information on the basis of the scenario information of the directory 
information provider and outputting/displaying the secondary information.

Detailed Description Text (3) : 

FIG. 3 shows an example of a system construction of a database search system 
according to the present invention, in which the present invention is applied to a 
case where a WWW (World Wide Web) server is searched by utilizing WWW browsers on 
the Internet.

Detailed Description Text (4) :

The database search system comprises WWW servers 61, 62, . . . , 6n, which are WWW 
database providers connected to the Internet 50; a directory information provider 
30 for searching position information of data on the Internet and providing 
directory information thereof; and a database searcher 20 for acquiring data of the 
WWW servers 61, 62, . . . , 6n on the basis of the directory information of the 
directory information provider 30 and displaying it to a user. The directory 
information provider 30 provides scenario information 40 composed of main 
information presenting sequence control information 41, for arranging on a time 
axis the position information of the main information to be output/displayed to a 
user of the database searcher 20; secondary information presenting sequence control 
information 42, for inserting secondary information into the output/display of the 
main information; and presenting frequency control information 43, for controlling 
the insertion timing of the secondary information into the main information. The 
database searcher 20 comprises a WWW browser 21 and a browser control means serving 
as the means for inserting the secondary information into the main information 
based on the scenario information 40 of the directory information provider 30.

Detailed Description Text (5) : 

FIG. 4 shows an example of the construction of the directory information provider 
30, which includes a communication unit 31 for communicating with the network, an 
insert unit 32 for inserting the secondary information into the main information 
and supplying the main information together with the secondary information to the 
communication unit 31, and a memory unit for storing the scenario information 40. 
The scenario information 40 is composed of position information (URLs) of the 
information and time information, and includes, as information of a presenting 
scenario of search information, the main information presenting sequence control 
information 41, the secondary information presenting sequence control information 
42 and the presenting frequency control information 43. The main information 
presenting sequence control information 41 takes the form of main information 
described by the position information (URL) of the main information together with 
the time information, arranged on a time axis. Similarly, the secondary information 
presenting sequence control information 42 takes the form of secondary information 
described by the position information (URL) of the secondary information together 
with the time information, arranged on a time axis.
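One way to picture scenario information 40 as data is sketched below. It is only an illustration: the class and field names are hypothetical, each entry's time information is assumed to be a display duration in seconds, and the presenting frequency control information 43 is assumed to reduce to a single insertion interval. `on_time_axis` shows what "arranged on a time axis" could mean for such entries.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TimedEntry:
    url: str          # position information (URL)
    duration: float   # time information: seconds on the time axis (assumed)


@dataclass
class ScenarioInformation:
    main: List[TimedEntry]        # cf. main sequence control information 41
    secondary: List[TimedEntry]   # cf. secondary sequence control information 42
    insert_interval: float        # cf. frequency control information 43 (assumed seconds)


def on_time_axis(entries: List[TimedEntry]) -> List[Tuple[float, float, str]]:
    """Lay entries end to end, returning (start, end, url) for each."""
    out, t = [], 0.0
    for e in entries:
        out.append((t, t + e.duration, e.url))
        t += e.duration
    return out
```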

Detailed Description Text (16) : 

First, the user downloads the scenario information 40 by accessing the directory 
information provider 30 from the database searcher 20 and stores it in the 
searcher's own memory as scenario information 26. When information is to be 
displayed by performing the database search, the database searcher 20 analyzes the 
scenario information 26 with a scenario analyzer 24 of a browser controller 22 to 
acquire the position information of the main information and the secondary 
information, acquires the search information (a file) from a WWW server 61, 62, 
. . . , or 6n, and outputs the main information through a communication device 23 
to a WWW browser 21 while inserting the secondary information into the main 
information. The WWW browser 21 displays the retrieved information under control of 
the browser controller 22. A timer 25 is used both for the time control of the 
information to be displayed and for the time control of inserting the secondary 
information into the main information at every predetermined time interval. Using 
the WWW browser 21, the user can thus read the main information with the secondary 
information inserted at every predetermined time interval. Where the secondary 
information is instead inserted according to the number of displayed pages of the 
main information, a page counter is provided to count the pages of the main 
information. Although in the described embodiment the database search is performed 
on the Internet, the present invention can be applied to other networks: for 
example, to database search/display in a network constructed of a plurality of 
servers and a client connected to each other through a transmission line with 
databases distributed among the servers, in a network within a specific enterprise, 
or in other closed networks.
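The two insertion policies above, timer-driven and page-counter-driven, can be sketched as functions that produce the display sequence. This is a hedged illustration, not the patent's implementation: the function names are hypothetical, secondary items are assumed to be reused in rotation, and a timer-triggered insertion is assumed to take effect at the next page boundary rather than interrupting a page.

```python
from itertools import cycle
from typing import List, Tuple


def insert_by_time(main: List[Tuple[str, float]], secondary: List[str],
                   every: float) -> List[str]:
    """Timer variant: each time another `every` seconds of main information
    have elapsed, insert the next secondary item after the current page."""
    if every <= 0:
        raise ValueError("every must be positive")
    ads = cycle(secondary) if secondary else None
    out: List[str] = []
    elapsed, next_insert = 0.0, every
    for url, seconds in main:
        out.append(url)
        elapsed += seconds               # what timer 25 would measure
        while elapsed >= next_insert:    # one long page may span several intervals
            if ads is not None:
                out.append(next(ads))
            next_insert += every
    return out


def insert_by_pages(main: List[str], secondary: List[str], every: int) -> List[str]:
    """Page-counter variant: insert the next secondary item after every
    `every` displayed pages of main information."""
    if every <= 0:
        raise ValueError("every must be positive")
    ads = cycle(secondary) if secondary else None
    out: List[str] = []
    for count, url in enumerate(main, start=1):  # count acts as the page counter
        out.append(url)
        if ads is not None and count % every == 0:
            out.append(next(ads))
    return out
```

For example, `insert_by_pages(["p1", "p2", "p3", "p4", "p5"], ["ad1", "ad2"], 2)` yields p1, p2, ad1, p3, p4, ad2, p5. Separating the schedule from the actual display keeps the policy testable without a real timer or browser.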

Detailed Description Text (22) : 

Therefore, it is possible to insert advertisements, etc., into the WWW server of 
the Internet as the secondary information, and the database provider can expect 
advertising revenue.

Current US Cross Reference Classification (1) : 
705/14 

Current US Cross Reference Classification (4) :
709/232 

CLAIMS : 

6. A database search system comprising: 

a World Wide Web database provider connected to the Internet; 

a directory information provider for retrieving position information of data on the 
Internet and providing directory information thereof; and 

a database searcher for acquiring data of said database provider and displaying the 
data of said database provider to a user on the basis of the directory information 
of said directory information provider, 

said directory information provider comprising means for providing scenario 
information comprised of main information presenting sequence control information 
for arranging position information of the main information to be output/displayed 
to a user of the database searcher on a time axis, secondary information presenting 
sequence control information for inserting secondary information into the main 
information and outputting/displaying it and presenting control information for 
controlling insertion timing of the secondary information into the main 
information, 

said database searcher comprising means for inserting the secondary information 
into the main information on the basis of the scenario information of said 
directory information provider and outputting/displaying the secondary information, 

wherein the secondary information is advertisement information inserted into said 
database providing server and the secondary information is set such that the 
display inhibition on the side of the user is invalidated. 